About Slack data collection
To collect Slack data, first upload it to DataHub, and then collect it using Collect > Slack. When you collect, you can select the data types (Direct Messages, Private Channels, Public Channels, and so on) to process from the selected Slack file. You can also filter messages by a date range and split them into separate documents by day, week, or month.
When collecting a Slack data folder, make sure the folder contains users.json, org_users.json, or integration_logs.json. The system uses these files to identify the folder as a valid Slack folder. The users.json and org_users.json files are required because they identify the individuals associated with each conversation. In addition, the following JSON files determine the conversation types.
- channels.json for public channel messages
- groups.json for private channel messages
- mpims.json for group direct messages
- canvases.json for canvases
- lists.json for lists
- dms.json for direct messages
We recommend that you do not modify any data generated through the standard Slack data export. You should provide all JSON files.
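The file checks described above can be sketched in a few lines of Python, which may be useful for inspecting an export folder before you upload it. This is only an illustration, not the logic Epiq Discover uses: the function name inspect_slack_folder and the shape of its result are hypothetical, and the sketch assumes the JSON files sit at the top level of the folder.

```python
from pathlib import Path

# File names come from the text above; everything else here is hypothetical.
IDENTIFYING_FILES = ("users.json", "org_users.json", "integration_logs.json")
CONVERSATION_FILES = {
    "channels.json": "public channel messages",
    "groups.json": "private channel messages",
    "mpims.json": "group direct messages",
    "canvases.json": "canvases",
    "lists.json": "lists",
    "dms.json": "direct messages",
}


def inspect_slack_folder(folder):
    """Report which of the expected JSON files exist at the top level of a folder."""
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}
    expected = set(IDENTIFYING_FILES) | set(CONVERSATION_FILES)
    return {
        # Any one of the identifying files marks the folder as a Slack folder.
        "looks_like_slack": any(name in present for name in IDENTIFYING_FILES),
        # Conversation types that can be determined from the files found.
        "conversation_types": [
            label for name, label in CONVERSATION_FILES.items() if name in present
        ],
        # Expected files that were not found, so you can provide all JSON files.
        "missing": sorted(expected - present),
    }
```

A non-empty "missing" list does not necessarily mean the export is invalid, but it shows which files to look for before uploading.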
If the Slack option displays a lock icon, Slack collection is not enabled. To enable this feature, contact your Epiq Discover Administrator.
After processing, you can view the messages from a chat in an HTML file. When you split the messages into separate documents, the system creates a separate document (HTML segment) for each day, week, or month.
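The date-range filter and the day, week, or month split can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the product's implementation: it assumes each message carries the Slack export "ts" field (epoch seconds stored as a string), that dates are evaluated in UTC, and that weeks start on Monday. The function names segment_key and split_messages are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def segment_key(day, group_by):
    """Return a label for the segment that a calendar date falls into."""
    if group_by == "day":
        return day.isoformat()
    if group_by == "week":
        # Assumption: a week is labeled by its Monday start date.
        return (day - timedelta(days=day.weekday())).isoformat()
    if group_by == "month":
        return day.strftime("%Y-%m")
    raise ValueError("group_by must be 'day', 'week', or 'month'")


def split_messages(messages, group_by, start=None, end=None):
    """Filter messages to [start, end] (inclusive dates) and group them by segment."""
    groups = defaultdict(list)
    for msg in messages:
        # Assumption: "ts" is epoch seconds as a string, interpreted in UTC.
        day = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc).date()
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        groups[segment_key(day, group_by)].append(msg)
    return dict(groups)
```

Each key in the returned dictionary corresponds to one HTML segment in this simplified model.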
When you reprocess the same Slack data without promoting the previously generated HTML file, the system processes these files as follows.
- For the same custodian, the system does not create the HTML file unless the previously generated file was promoted. However, for a different custodian, the system creates a new HTML file.
- If any messages in a private or public channel were edited, the system generates a supplementary HTML file showing both the original and edited messages. This file is linked as a child document to the original HTML file. The name of the supplementary file consists of the original HTML file name, followed by _SUPP and a number indicating the supplement version.
- If new messages were added to any chat, the system creates a new HTML file containing only the newly added messages.
- When a custodian's data is processed in stages, for example, first for January and later for January and February, the system recognizes the overlap. In such cases, the previously processed January data is excluded from the new HTML file. However, if the HTML file for January was already promoted, the new HTML file includes both January and February data.
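The overlap rule for staged processing can be summarized in a short Python sketch. It models only the behavior described above for a single chat and a single custodian. How the system actually matches previously processed messages is not documented here, so the sketch assumes that the Slack "ts" value identifies a message within a chat; the function name messages_for_new_html is hypothetical.

```python
def messages_for_new_html(incoming, previous_ts, previous_promoted):
    """Pick the messages that would appear in the newly generated HTML file.

    incoming: messages in the current run (dicts with a "ts" key).
    previous_ts: set of "ts" values covered by the earlier HTML file.
    previous_promoted: True if the earlier HTML file was already promoted.
    """
    if previous_promoted:
        # The earlier file was promoted, so the new file contains everything.
        return list(incoming)
    # Otherwise the overlap is excluded and only new messages remain.
    return [msg for msg in incoming if msg["ts"] not in previous_ts]
```

For example, if January was processed earlier but not promoted, a later run covering January and February yields only the February messages in this model.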
Emojis, emoticons, and GIFs (Giphy) appear in the HTML file, but the system does not extract these elements as attachments. The following settings in Project Settings > Processing do not apply to Slack HTML files. However, they do apply to Canvas and List files that are not shared through chat.
- Embedded Images Extraction
- Excluded File Extensions
- Text Extraction Size
- Enable MIME Type Extraction
Additionally, all of the above settings except Excluded File Extensions apply to attachments.
HTML file identification
The chat HTML file name includes the following components.
- Initials of the custodian’s last and first name
- Chat name
- Thread ID
- Letter “W” (for weekly segmentation)
- Either the month (for monthly segmentation) or the start date
The chat HTML file header displays the following information.
- Type of chat: Public Channel, Direct Message, Private Channel, or Group Direct Message
- Name of chat: channel name or group chat name
- Message group-by information: Day, Week, or Month
- Date/month: the start date for weekly segmentation, the specified date for daily segmentation, or the month for monthly segmentation
- Names of participants
Relativity Short Message Format (RSMF) file generation
When processing Slack data, the system generates an RSMF file for each HTML file to use in Relativity. You can transfer these files from Epiq Discover to Relativity, but you cannot view them in Relativity.
The maximum size of an RSMF file is 200 MB or 10,000 events. If an RSMF file exceeds either limit, the system splits it based on conversation threads and time intervals.
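To make the event limit concrete, the following Python sketch packs conversation threads into parts of at most 10,000 events and cuts an oversized thread into consecutive, time-ordered slices. It is only one plausible reading of "conversation threads and time intervals": the actual splitting algorithm is not described here, the 200 MB size limit is ignored for brevity, and the function name split_events is hypothetical.

```python
MAX_EVENTS = 10_000  # event limit per RSMF file, from the text above


def split_events(threads, max_events=MAX_EVENTS):
    """Pack threads into parts that each hold at most max_events events.

    threads: list of threads, each a list of events already in time order.
    Returns a list of parts, each a flat list of events.
    """
    parts, current = [], []
    for thread in threads:
        # A thread larger than the limit is cut into time-ordered slices.
        slices = [thread[i:i + max_events] for i in range(0, len(thread), max_events)]
        for piece in slices:
            # Start a new part when adding this piece would exceed the limit.
            if current and len(current) + len(piece) > max_events:
                parts.append(current)
                current = []
            current = current + piece
    if current:
        parts.append(current)
    return parts
```

Threads are kept whole whenever they fit, so a reply is not separated from its parent message unless the thread itself exceeds the limit.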